ABSTRACT

A great deal of study has been conducted in the last ten
years concerning the use of voice recognition equipment with
computers. It was hoped that its use would reduce the
required entry time and error rate, and improve the man-machine
interface between the user and the computer.
There are many potential applications for such voice
recognition use in the military, and specifically in the area
of Command, Control and Communications (C3). War games are
often used today to test the effectiveness of C3 technologies,
and WES is one such war game.

This paper will assess the feasibility of using voice 
recognition equipment to run WES by comparing the results of 
an experiment employing both voice and manual typing input 
modes. The results show that in this particular task typing 
does a somewhat better job than the buffered voice mode, while 
unbuffered voice has very poor results. 






TABLE OF CONTENTS

I. INTRODUCTION
   A. A BRIEF HISTORY OF VOICE TECHNOLOGY
      1. General Background
      2. Some Past Uses of Voice Recognition Systems
   B. STUDY AND TESTING OF VOICE RECOGNITION SYSTEMS
   C. POSSIBLE MILITARY USES OF VOICE RECOGNITION SYSTEMS

II. BACKGROUND
   A. VOICE RECOGNITION IN C3
   B. WAR GAMING/SIMULATIONS FOR MEASURING C3 EFFECTIVENESS
      1. Manual and Voice Inputs for Games
   C. DESCRIPTION OF THE WARFARE ENVIRONMENTAL SIMULATOR

III. EXPERIMENTAL DESIGN
   A. CONCEPT OF THE EXPERIMENT
   B. EQUIPMENT USED
      1. Hardware Description
      2. Available Input Modes
   C. SELECTION OF SUBJECTS
      1. Backgrounds
      2. Initial Training
   D. CONDUCTING THE EXPERIMENT

IV. PRESENTATION OF DATA
   A. DATA COLLECTION TECHNIQUES
   B. GENERAL RESULTS
   C. RESULTS FOR SCENARIO TIMES
   D. RESULTS FOR INPUT ERRORS
      1. Recognition Errors
      2. Operator Errors
      3. Total Errors

V. CONCLUSIONS AND RECOMMENDATIONS
   A. EXPERIMENTAL CONCLUSIONS
   B. RECOMMENDATIONS FOR FURTHER STUDY

APPENDIX A: VOICE STUDIES AT NPS
APPENDIX B: WES VOCABULARY
APPENDIX C: TEST SUBJECTS' BACKGROUNDS
APPENDIX D: TYPING ABILITY TEST
APPENDIX E: LIST OF 20 WES COMMANDS
APPENDIX F: FIVE LISTS OF FOUR WES COMMANDS
APPENDIX G: INSTRUCTIONS FOR SUBJECTS
APPENDIX H: PRACTICE VOICE COMMANDS

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST






I. INTRODUCTION



The cost of computer hardware has dropped dramatically 
in recent years, and the use of computers throughout our 
society has skyrocketed to help us manage the glut of data
with which we are often presented and to solve the increasingly
complex problems of the present and future. Historically,
data has been entered into the computer by keypunch,
which can be slow, monotonous and error-filled for all but 
the very well-trained. Researchers have looked for a better, 
more efficient man-machine interface than the keypunch, and 
as early as the 1950's they realized that the most natural 
type of communication which we as humans use is speech. So 
why not simply speak to a computer as you would to a fellow 
worker and have the computer perform whatever task you have 
directed? 

A. A BRIEF HISTORY OF VOICE TECHNOLOGY 
1. General Background 

Voice recognition systems have received quite a bit 
of interest since the 1950's, mainly during the past fifteen 
years. Automatic speech recognition, per se, is concerned 
with automatically determining linguistic messages spoken to 
the voice recognizer by comparing them to acoustic data stored 
in the recognition system. Both industry and the military 
have decided to study the feasibility of incorporating
interactive voice recognition systems in their computer
operations in order to have a more natural interface with the 
computer, to increase the speed of data entry and retrieval 
and thereby increase throughput, and to lower the input error 
rate. A voice recognition system, using one's natural lan- 
guage, would certainly seem to have the potential for reduc- 
ing errors at the man-machine interface. In addition, the 
higher-level personnel in industry and the military, those 
specifically who must make the important decisions and who 
most need the decision-making aid of the computer, are those 
least likely to sit at the keyboard and use the computer.
So it was thought that voice interaction would help these
high-level personnel become more direct users of the systems 
on which they depend. 

Interactive voice recognition systems (i.e., those 
which give either a vocal or a displayed response to a verbal 
input) can be basically divided into two categories: 
isolated word recognizers and continuous speech recognizers. 
Isolated word recognition systems were the first type 
developed and by far the easier of the two to engineer and 
construct. An isolated word recognizer, with a limited vocabulary
of x number of words or utterances (short phrases),
must simply recognize the utterance spoken to it and respond 
as programmed. This recognition is accomplished by "training" 
the system prior to its use. Anyone who will be using the 
system trains it by repeating the various vocabulary words a
number of times, usually between five and ten, with different
inflection, stress, pronunciation, etc., while in a "training 
mode." The parameters of the pronunciation of each utterance 
are then averaged and stored in the digital speech processor 
memory of the system. Then when a word or phrase is spoken 
to the recognition system, its parameters are compared 
digitally to all those stored in memory and, ideally, a match
is found and the proper response made by the computer [1].
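
The training and matching steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any particular recognizer: the three-element feature vectors, the Euclidean distance measure, the rejection threshold and all function names are hypothetical choices made for the example.

```python
from math import sqrt

def train_template(repetitions):
    """Average several feature vectors of one utterance into one template."""
    count = len(repetitions)
    return [sum(values) / count for values in zip(*repetitions)]

def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed measure)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features, templates, threshold):
    """Return the closest vocabulary word, or None if no template is close enough."""
    best_word, best_dist = None, float("inf")
    for word, template in templates.items():
        d = distance(features, template)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word if best_dist <= threshold else None

# Training mode: each word is repeated several times (twice here for brevity).
templates = {
    "course": train_template([[1.0, 0.0, 2.0], [1.2, 0.2, 1.8]]),
    "speed": train_template([[0.0, 3.0, 0.0], [0.2, 2.8, 0.2]]),
}

print(recognize([1.0, 0.1, 2.0], templates, threshold=0.5))
print(recognize([5.0, 5.0, 5.0], templates, threshold=0.5))
```

The threshold is what lets the sketch reject an utterance outright instead of forcing a match to the nearest stored word.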

While this indeed sounds like a complex process, consider
what the continuous speech recognition system must do.
In addition to all the above it must be able to recognize
word sequences and digit strings. It must be able to find 
boundaries between words, or segment the utterance either 
explicitly or implicitly by trying to fit together sequences 
of word pronunciations before the final classification 
process [2,3]. It is difficult to analyze the beginnings and 
endings of words unless adjacent words are known; it is much 
easier to recognize words spoken in isolation, or separated 
by short pauses, than those with no pauses between them. 
However, it is very unnatural for humans to pause after speak- 
ing each word in a sentence, and although the first isolated 
word recognition systems have been in use since 1972, further 
study into advanced systems has continued. 

2. Some Past Uses of Voice Recognition Systems

Beginning in 1972, there have been several successful 
uses of interactive voice recognition systems in industrial
settings. These have been strictly isolated word systems up
to this time. It has been found that by using voice systems 
to interact with a computer, a worker's hands and eyes are 
both free to continue their tasks. Data entry is thereby
faster, because the worker does not have to stop what he is
doing, write down or directly enter data, and then return to
where he left off. Voice also
cuts down on the number of errors often encountered in this 
process or in other processes where the first worker must 
relay information to a second worker who then enters what he 
heard (perhaps incorrectly) into the data system. 

Airlines were the first to use voice recognition to 
input data to a computer for the correct routing of baggage 
to various aircraft. It was found very efficient to allow 
the baggage handler to input data by voice, freeing his hands 
and eyes to look at and handle the pieces of luggage. Banks 
have been able to accomplish paperless transfers of funds, 
dividends, retirement payments and the payments of bills by 
simply speaking the dollar amount to be transferred to the 
voice recognition system. Quality assurance checks on 
manufactured goods have been greatly simplified and speeded 
up in many cases by allowing inspectors to use their hands 
and eyes for the inspections while simultaneously inputting 
data to a computer by voice. In addition to these few 
examples of discrete speech recognition there are many other
areas where voice recognition systems either are currently
being used, or could easily be used in the future [1,4].

B. STUDY AND TESTING OF VOICE RECOGNITION SYSTEMS 

Although discrete speech recognition systems have been,
and are, in use, research has continued on both the discrete
and continuous speech systems. Probably the largest such 
study undertaken to date has been the Advanced Research 
Projects Agency (ARPA) five year $15 million Speech Under- 
standing Research (SUR) project begun in 1971. This project 
was designed to provide a breakthrough in the handling of 
spoken sentences, by the use of higher-level linguistic 
information and specific task-dictated constraints on what 
could be said [5]. It was thought that this was an important
project because of increased industrial interest in speech 
recognition, government interest in future programs, the work 
of several foreign countries in the field, and projected 
future widespread applications. In 1978 the projected ten
year sales of 2.5 million speech processing units ($4.8
billion) seemed to lend a great deal of credence to these
points [5].

The SUR project was concerned with understanding, as
opposed to simple word recognition. Understanding here
meant having the system interpret an utterance and respond
correctly. The project was designed to be highly task-oriented,
and to have speech analyzed and interpreted in
the context of a task, rather than interpreting each word or
component of the utterance individually [3,6]. Other goals
included a working vocabulary of 1,000 words for the system
and accuracy of 90 percent averaged over several different
speakers.

The SUR project was designed to develop several inter- 
mediate "throw-away" systems rather than to work toward one 
carefully designed ultimate system. With this in mind there 
were five main system contractors and four specialist contractors
engaged in the research at the start of the SUR project.
The five main contractors were Bolt, Beranek and Newman (BBN);
Carnegie-Mellon University; Lincoln Laboratory; System Development
Corporation; and SRI International. The four specialist
contractors were Haskins Laboratory; Speech Communications
Research Laboratory; Univac; and the University of California
at Berkeley [7].

Approximately halfway through the five-year project,
the three systems which seemed to be farthest along in meeting
ARPA's goals were selected to continue the project. When SUR
ended in September 1976 it was generally agreed that it had
greatly advanced the state of the art in continuous voice
recognition and that cost-effective speech input was a
plausible scientific and technical goal [6]. One of the
final three systems, HARPY, developed by Carnegie-Mellon
University, met all of ARPA's initial goals. Using
a vocabulary of 1,011 words and five different speakers,
HARPY achieved a total sentence accuracy (i.e., all words
correct) and semantic accuracy (i.e., correct response
accuracy) of over 90 percent for the specific task of document
retrieval [5,6]. The other two systems tested, HWIM (for
Hear What I Mean, by BBN) and HEARSAY II (by Carnegie-Mellon)
fell somewhat short of the stated objectives.

Another more recent study of voice technology was done
for the Rome Air Development Center, Rome, New York. In this
project the use of voice systems to input cartographic data
for the Defense Mapping Agency Aerospace Center was studied.
It was found that voice input was faster, more accurate and
easier to use than the paper, pencil and keypunch that were
then in use. In addition, the voice system eliminated
the need for skilled typists to interact with the computer.
It was found that inexperienced personnel entered data much
faster by voice than did those at a keyboard who were not
skilled typists, indicating that much less
training was required to operate the voice recognition system
than to skillfully use the keyboard [8]. For this particular
task, and for others as well, since voice is the most natural
mode of communication, it was hoped that its performance level
would be higher than manual input with a minimum of training.

A final example of a recent voice recognition study [9],
carried out at the Naval Postgraduate School (NPS), compared
the uses of manual and voice inputs to run a distributed
computer network. Using twenty-four military officers as
subjects operating on the ARPA Network, and using a fixed
scenario of instructions, it was found that voice input,
again with minimal voice practice, was 17.5 percent faster
than manual typing input, and manual input had 183.2 percent
more entry errors than did the voice input. It is presumed
that an even greater difference would have been recorded had
experienced voice input subjects been used.
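
The two percentages quoted above can be read as simple relative differences. The short Python sketch below shows one plausible way to compute them. It assumes that "faster" means a reduction in entry time relative to the manual time, and that "more errors" is measured relative to the voice error count; the study's own definitions are not stated here. The totals used are hypothetical values chosen only so that the ratios reproduce the quoted figures. They are not data from the study.

```python
def percent_faster(manual_time, voice_time):
    """Reduction in entry time for voice, as a percentage of the manual time."""
    return (manual_time - voice_time) / manual_time * 100.0

def percent_more_errors(manual_errors, voice_errors):
    """Extra manual errors, as a percentage of the voice error count."""
    return (manual_errors - voice_errors) / voice_errors * 100.0

# Hypothetical totals, chosen only so the ratios match the quoted percentages.
manual_time, voice_time = 200.0, 165.0    # seconds
manual_errors, voice_errors = 354, 125    # entry errors

print(round(percent_faster(manual_time, voice_time), 1))
print(round(percent_more_errors(manual_errors, voice_errors), 1))
```

Under this reading, "183.2 percent more" means that manual input produced nearly three times as many entry errors as voice input.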

C. POSSIBLE MILITARY USES OF VOICE RECOGNITION SYSTEMS 

The military is also carefully studying the use of voice 
interactive systems for many varied applications. The author 
has encountered several possible Navy applications which are 
prime candidates for voice recognition use. In the area of 
tactical data systems, normally data has been directly entered 
from remote sensors or by an operator at a keyboard, and then 
either acted upon or retrieved by the operator. Voice systems 
can greatly facilitate the operator's data entry or retrieval 
by allowing him to interact vocally with the system rather 
than requiring a skilled typist at the keyboard. This should 
reduce the time needed for interaction and the possibility of 
many errors [6].

A study at NPS addressed the possibility of using a voice 
recognition system as the interface between a ship's Tactical 
Action Officer (TAO) and the Naval Tactical Data System (NTDS) 
computer in order to reduce reaction time. This study also 
postulated the use of a voice synthesizer to output the 
information requested from the computer. The authors felt
that there is an incompatibility between a discrete speech
system and other communications which the TAO uses. During
a period of tension, it might be difficult to use discrete
speech with one system and continuous modulation on others.
It was also felt that a discrete speech recognition system
would not be compatible with the rapid pace of a TAO's
duties [10]. Further study should be done in this area.

Naval aviation is a field where there are a great many 
possibilities for the use of voice systems. One study [11] 
reported investigating the feasibility of using a Voice 
Recognition and Synthesis (VRAS) system with the Advanced 
Integrated Display System (AIDS) on Navy aircraft. VRAS, a 
software package of real-time voice processing routines, when 
used with the AIDS cockpit information system would provide a 
much improved man-machine interface between the pilot and the 
onboard computer. The voice interactive system in this case 
could handle complex tasks encountered in an airborne environ- 
ment and could free the eyes and hands of the pilot for other 
tasks. Some possible uses would include selecting a missile 
verbally vice manually, and having this confirmed verbally, 
thereby allowing the pilot to fly a better intercept profile. 
The system could be used for reporting (e.g., "report airspeed"),
data entry, systems checks where VRAS reports when
a checklist is complete, and so on. It is thought that this
might help reduce the clutter of instrumentation and fault
warning displays in the aircraft. In addition, it was even
postulated that a speech recognition system together with an
adequate display system could substitute for a second man in
an F-14/A-6 type aircraft. It could save space, and reduce
weight, fuel consumption, manpower, training and the life
cycle costs of an aircraft [6].

Other military areas where voice recognition systems 
could be used might include command centers, combat informa- 
tion centers on board Navy ships, inputs for weapons fire 
control systems, and air traffic control. Very interesting 
and relevant research is presently being done at NPS on the 
possibility of using voice systems for the military photo 
interpreter and for use with the Joint Chiefs of Staff (JCS) 
Emergency Action Message (EAM) system. Appendix A lists voice 
recognition studies which have been, or are presently being,
conducted at NPS.

Although a good deal of research has been done on the 
feasibility and design of interactive voice recognition sys- 
tems, much is yet to be done. For instance, how do you improve 
the acoustic phonetic analysis ability of a system so that it 
is able with a high degree of accuracy to understand continuous 
voice commands from a large number of people? Is there really 
even a need for continuous voice recognition systems? They 
would certainly be nice, and they are much more "natural" than 
isolated word systems for a human user, but what is the op- 
portunity cost of developing them? These questions are now 
being answered and will be answered in the future, thanks in 
great part to the impetus of the SUR project.

II. BACKGROUND



The area of Command, Control and Communications, or C3,
has been an integral part of human existence since the beginning
of civilization, although it has gone by different titles
and has had slightly different shades of meaning. There is a
great deal of difficulty even now in defining and quantifying
this "new" area of C3. It is definitely a process, it involves
equipment and individuals, and also goals or missions.
To this author C3 is a process or means by which a military
commander (or civilian authority) exercises authority and
direction in allocating scarce resources (e.g., money, troops,
ships, etc.) in order to achieve organizational goals in the
most efficient manner possible.

A. VOICE RECOGNITION IN C3

In his action of directing or allocating resources, in
performing the vital elements of C3, the commander must interact
with individuals and equipment. Several of the military
examples of speech recognition study in Section I fall within
this area of C3. These examples included the TAO-NTDS interface,
use of voice recognition in a command center or CIC and
use of voice recognition by a pilot in the cockpit of an
aircraft. Each of these certainly depicts a command and
control situation where voice recognition systems might be of
use. Additionally, the example [9] of the increased speed of
input and lower error rates provided by voice input while
controlling a distributed computer network certainly points
to the possible use of voice for C3 purposes.

Several features make speech recognition potentially very
useful in the area of C3. It is felt that there will be a
closer coupling of the commander with the system he depends on
when using speech inputs. Most commanders would never tie
themselves down to a keyboard during any crisis or battle
situation. With the use of speech recognition and a wireless
microphone there would not be this feeling of being tied
down. There would also be more centralization of control in
a crisis situation. This would result in increased speed of
interaction with the system, and a more effective use of the
new support technology available [6].

In a C3 environment, voice systems could certainly be
used for data input and retrieval. A Task Force commander 
would directly use such a system for information management 
and evaluation, as an aid in decision making, and for decision 
dissemination. The closer a commander can be to the system 
upon which he bases his decisions, the better the quality of 
his decisions should be, with greater avoidance of serious 
error. Command language also is of limited complexity with a 
rather large vocabulary to cover many possibilities, and this 
should suit it well to a voice recognition system.

B. WAR GAMING/SIMULATIONS FOR MEASURING C3 EFFECTIVENESS

One of the major problems in the C3 arena has been how to
measure the effectiveness of a C3 system. Since C3 must function
in distinctly different conditions (e.g., peacetime,
periods of crisis, conventional or nuclear war) this becomes
increasingly difficult. How does one gauge or measure
whether a Command and Control system will function in a
nuclear war? More important, perhaps, is whether the system
will function in those transition times between each of these
major conditions.

It is certainly not sufficient to measure effectiveness
by simply comparing the "output" of one system with that of
another. For example, for a new communications system simply
having a higher message handling rate or a lower bit error
rate than an existing system does not necessarily improve the
C3 capability. What makes a system a better C3 system is its
effectiveness in improving the chances of victory in battle,
or in achieving organizational goals. Since it is often not
possible to test a system under such conditions, simulations
and models are often used.

War games are a type of simulation frequently used by the
military to evaluate C3 effectiveness. Through the use of a
war game, evaluators and commanders can determine with a great
deal of accuracy the effectiveness of present and proposed
C3 technologies under simulated warfare conditions. Such
war games often allow for replication so that a scenario
basically can be replayed using different C3 strategies in
order to evaluate the effectiveness of one system as opposed
to another. War games are a very cost-effective means of
running such an evaluation under realistic conditions using
experienced players.

1. Manual and Voice Inputs for Games

The most realistic war games today, those which are 
able to be run at a near real-time speed, which are able to 
enter and disseminate a large volume of sensor and fire con- 
trol data, and which are able to regularly and quickly update 
displays are either computer-assisted or computer-run war 
games. Manual war games, although generally no less accurate 
than computer-assisted games, are usually very slow moving, 
require many extra participants to record data and often 
quickly become monotonous and tedious. In a computer-assisted 
war game, commands are generally input at a keyboard, as is
usually the case for most other computer-type functions, as
previously noted. It is certainly plausible to consider using
voice input devices to run such war games. 

If war games are to be used to evaluate C3 effectiveness,
one facet of such an evaluation certainly could be any
increase in effectiveness provided by a voice recognition
system as opposed to conventional manual input. In fact a
war game can be used as a vehicle for testing the concept of
using voice recognition equipment in any number of other
military applications where high speed of input and low error
rates are necessary. It was with this thought in mind that
the author decided to develop and conduct an experiment comparing
the use of voice and manual inputs to run a Naval war
game. The author chose the CINCPACFLT version of the Warfare
Environmental Simulator (WES) as the war game to use in this
experiment. WES was chosen mainly because it is easily accessible
from the NPS Remote Site Module (RSM) and because the
author was already somewhat familiar with its operation.

C. DESCRIPTION OF THE WARFARE ENVIRONMENTAL SIMULATOR [12] 

The Warfare Environmental Simulator (WES) is a computer- 
assisted war game which runs on a DEC KL-2040 or a PDP-10 
computer at the Naval Ocean Systems Center (NOSC), San Diego,
California. WES is a two-sided interactive game in which 
Blue and Orange sides can define, structure and control their 
own forces. The game is strictly a Naval war game which 
employs approximately 80 player commands to control the plat- 
forms and sensors engaged in the game. 

Each command position in a WES game contains a graphics 
terminal situation display, an alphanumeric terminal present- 
ing status board displays and another alphanumeric terminal 
for input of player commands. This player terminal acts as 
both an input and an output terminal. While the system is in 
the input mode output messages are queued. The color graphics 
display is driven by a PDP-11/70 which is interfaced with 
NOSC's KL-2040 or PDP-10 via the ARPANET. WES operates under 
either the TOPS-20 or TENEX systems.

The WES game is a combination of three major processes,
called BUILD, FORCE and WARGAM. Each of these is an integral 
part of the war game and must be initialized and used prior 
to and during game play. The BUILD process is used to create 
and modify a database of game objects such as ships or shore 
bases. With BUILD a player may add, delete or modify a file 
of game objects in the database. This will normally be done 
prior to game play when determining the forces needed for the 
game. The database contains values for ship classes, shore 
bases, aircraft types, missiles, sensors and weapons. 

The FORCE process creates the actual game scenario to be 
used. With FORCE game objects from BUILD files are organized 
into task hierarchies for use in the game. FORCE specifies 
the actual names and classes of ships, their initial locations, 
courses and speeds along with any associated aircraft, sensors 
and weapons. FORCE allows a player to create new game scena- 
rios, to modify a scenario, to change numeric parameters or 
to input or delete items from a scenario. Contingency plans 
which might be used during a game can also be created and 
entered into the specific game database by using FORCE. 

WARGAM actually runs the interactive game based on the
chosen scenario and the commands input by the players. Once
initiated it responds to player commands, generates both the
graphics and the status board displays and updates these
displays each game minute. The WES graphics display at NPS
uses a GENISCO display processor/CONRAC CRT to display in
color the graphic situation display. This display includes
grid tick marks, background maps, NTDS symbology for friendly, 
neutral and enemy tracks, lines of bearing for passive sensors, 
weapons envelopes and game time. 

Six alphanumeric status board displays are controlled by 
WARGAM and are shown on a user terminal one at a time. The 
player controls the status board functions by depressing 
appropriate keys at the terminal. The six status board displays
include the following: active track status, passive
track (ESM) status, friendly ship status, friendly air status,
friendly shore bases status and flight status. These displays
then contain all the status information which one would 
expect to find in the CIC on a surface ship. 

The WES commands which players use to control the war game
are highly formatted in terms of syntax and input parameters.
Two types of errors are possible when inputting a command.
First, the syntax may be incorrect. In this case an immediate
warning is issued on the terminal saying that the command cannot
be parsed. This should alert the player to check his
command and then reenter it correctly. Second, a command
might order some impossible action (e.g., addressing a ship
not in the game). No immediate warning is issued in this case
since the order parses correctly. However, when execution of
the order is attempted it cannot be carried out and this fact
is displayed on the terminal for the player. When an order
is entered correctly, the system responds that the order has
been entered; this indicates that the order was parsed and
sent on for execution, but it does not guarantee that the
order is free of discrepancies (as noted in the second error
case above).
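
The distinction between the two error types can be illustrated with a small two-stage check in Python. This is a sketch only: the three-word command format, the verb list, the message wording and the function names are all hypothetical and do not reproduce the actual WES command syntax [12]. Only the ship names are taken from the scenario used in this study.

```python
# Hypothetical subset of verbs; the real WES grammar is far larger.
KNOWN_VERBS = {"COURSE", "SPEED"}
SHIPS_IN_GAME = {"ENTERPRISE", "BERKELEY", "STURGEON"}

def parse(command):
    """Stage 1: syntax check only. Raises ValueError if the command cannot be parsed."""
    parts = command.split()
    if len(parts) != 3 or parts[1] not in KNOWN_VERBS or not parts[2].isdigit():
        raise ValueError("command cannot be parsed")
    return parts[0], parts[1], int(parts[2])

def execute(order):
    """Stage 2: the order parsed correctly but may still be impossible."""
    ship, verb, value = order
    if ship not in SHIPS_IN_GAME:
        return f"order cannot be carried out: {ship} is not in the game"
    return f"{ship} {verb} set to {value}"

def submit(command):
    """Return the messages a player would see for one command."""
    try:
        order = parse(command)
    except ValueError as err:
        return [str(err)]                      # immediate warning, nothing executed
    return ["order entered", execute(order)]   # acknowledged, then executed later

print(submit("ENTERPRISE COURSE 270"))
print(submit("ENTERPRISE COURSE"))
print(submit("NIMITZ SPEED 20"))
```

In the third call the command is acknowledged as entered even though it later fails, which mirrors the second error case described above.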

It was with this game of WES as described above that the 
author conducted his voice/manual input experiment. The 
details and background of the experiment are described in 
Section III and its results in Section IV following. The 
conclusions drawn from the data collected address the feasibil- 
ity of using an automatic voice recognition system to run 
computer-assisted war games in general, and WES in particular.

III. EXPERIMENTAL DESIGN



A. CONCEPT OF THE EXPERIMENT 

The basic goal of this experiment was to test the feasibil- 
ity of operating WES by using voice inputs rather than the 
customary manual inputs. This would be accomplished by having 
a number of test subjects individually enter valid WES com- 
mands for BLUE forces while the game was running, recording 
the time necessary to successfully enter the commands and the 
number of errors committed with voice and manual input, and 
then analyzing the data to see whether one entry method was 
superior to the other. Although the WES game would be run- 
ning, the only commands entered would be for the BLUE forces 
and therefore there would be no interaction between BLUE and 
ORANGE, or actual "game play." BLUE-ORANGE interaction was 
not considered necessary for the goals of this experiment. 
However, it was considered important to have WES running dur- 
ing the experiment, rather than having the subjects merely 
type out the WES commands or speak them to a voice recognizer, 
so that the actual interaction with the WES input/output 
player terminal would be accomplished as in a two-sided war 
game. 

In order to run WES, as noted in Section II, game forces 
must be assigned and a scenario established. The author chose
to use an existing WES scenario with its associated forces
for the experiment. The CUBA scenario was chosen because it is
relatively simple yet provides entirely adequate forces for the
experimental goals. In this scenario three United States warships
(aircraft carrier ENTERPRISE, guided missile destroyer
BERKELEY and nuclear submarine STURGEON) are opposed by three
Soviet warships and one merchant ship in a setting similar to
the 1962 Cuban missile crisis. The test subjects would command
the ships and forces of the BLUE task force by using a
fixed series of commands provided to them.

It was necessary to establish a basic vocabulary which 
the subjects would use to enter the player commands to WES. 
This vocabulary had to be complete enough to allow formula- 
tion of any of the WES commands [12] which might be necessary 
during play of a game. The vocabulary had to contain all the 
scenario specific words (e.g., ENTERPRISE, BERKELEY) which 
might become necessary in order to command those BLUE forces 
in the CUBA WES scenario. Also, the vocabulary had to be 
compatible with both the voice and keyboard methods of entry. 
The vocabulary which was used is considered sufficient to run 
any basic WES game involving the forces in the scenario. The 
total vocabulary amounted to 162 words or short phrases 
(Appendix B).

B. EQUIPMENT USED 

1. Hardware Description [13] 

For the experiment a Threshold Model T600 discrete 
utterance voice recognition unit manufactured by Threshold
Technology, Inc. was used. The T600 is an electronic speech
recognition device which automatically recognizes utterances
of up to two seconds in duration. These utterances can be
several words in length as long as they do not exceed this
time duration. Since it is a discrete, or isolated, speech
recognition unit, there must be a short pause (at least 0.1
second) between utterances. The T600 allows up to 256
separate voice utterances to be stored in memory. As noted 
above, 162 utterances were the total vocabulary for this 
experiment. 
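
The timing and capacity limits just described can be expressed as a small check. The following Python sketch is illustrative only and is not part of the T600 software; the Utterance record and the function names are hypothetical, while the three numeric limits are those quoted above.

```python
from dataclasses import dataclass

MAX_UTTERANCE_S = 2.0   # longest single utterance (from the text)
MIN_PAUSE_S = 0.1       # shortest pause between utterances (from the text)
MAX_VOCABULARY = 256    # utterances the T600 can store (from the text)


@dataclass
class Utterance:
    label: str      # hypothetical: vocabulary entry being spoken
    start_s: float  # hypothetical: time at which speech begins
    end_s: float    # hypothetical: time at which speech ends


def check_utterances(utterances):
    """Return a list of messages, one per violated timing limit."""
    problems = []
    for i, u in enumerate(utterances):
        if u.end_s - u.start_s > MAX_UTTERANCE_S:
            problems.append(f"{u.label}: longer than {MAX_UTTERANCE_S} s")
        if i > 0 and u.start_s - utterances[i - 1].end_s < MIN_PAUSE_S:
            problems.append(f"{u.label}: preceding pause under {MIN_PAUSE_S} s")
    return problems


def vocabulary_fits(size):
    """True if a vocabulary of this size fits in the T600 memory."""
    return size <= MAX_VOCABULARY
```

Under these assumptions the 162-utterance WES vocabulary fits comfortably, since vocabulary_fits(162) is true.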

The Model T600 terminal used in this experiment con- 
sists of an analog speech preprocessor, microcomputer, 
CRT/keyboard unit, magnetic tape cartridge unit, remote voice 
input unit and noise-cancelling microphone. The speech pre- 
processor and microcomputer are contained in a main terminal 
processor unit. The speech preprocessor accepts spoken input 
from the remote voice input unit, extracts speech parameters 
and converts these to digital signals which are then processed 
by the microcomputer. The microcomputer compares these input 
signals with stored reference patterns to determine which 
vocabulary words were spoken. The reference patterns for all 
the vocabulary are established during a training phase when 
the user trains the voice recognizer by repeating each of the 
vocabulary utterances ten times. If a close match is found 
between an input speech utterance and a reference vocabulary
pattern, the utterance is "recognized" by the T600. It then
sends to the user's computer the appropriate output string 
of characters associated with the recognized input. 
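
The matching step can be pictured with a short sketch. The text does not state which speech parameters the T600 extracts or how it measures closeness, so the feature vectors, the Euclidean distance and the threshold below are all assumptions made purely for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, references, outputs, threshold):
    """Return the output string of the closest reference pattern.

    references: label -> stored reference pattern (from the training phase)
    outputs:    label -> output character string sent to the host
    Returns None when no reference pattern is within the threshold.
    """
    best_label, best_dist = None, float("inf")
    for label, pattern in references.items():
        d = distance(features, pattern)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_label is not None and best_dist <= threshold:
        return outputs[best_label]
    return None
```

A recognition error in the sense used later in this paper corresponds to this function returning the output string of the wrong label.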

The T600 has three types of memory which the user 
may modify: speech reference patterns, prompt character
strings and output character strings. As noted above the
speech reference patterns are formed when the user trains the 
voice recognizer by repeating the vocabulary utterances a 
number of times. The prompt character strings are input by 
a user at the keyboard and are displayed on the CRT for each 
utterance to prompt the speaker when he is training that 
particular utterance. The output character string, also 
initially entered via the keyboard, is the actual output 
sent to the user's computer over a communications interface 
by the T600 when an utterance is recognized. The recognized 
utterances are sent exactly as if they had been typed in at 
the keyboard. When spoken, each of the utterances is echoed
on the CRT as a visual display for the operator.

The speaker uses a noise-cancelling microphone plugged 
into the remote voice input unit while speaking to the T600. 
This microphone allows the T600 to be used in noisy areas.
The placement of the microphone by the speaker is very impor-
tant during both the training and recognition phases with
the T600. Recognition accuracy may decrease if the microphone
is moved from one position to another in relation to the
speaker's mouth. It should be placed in front of the lips
but not touching them, and slightly to the side of the
speaker's mouth. The microphone should just touch the lower 
lip when the lip is extended forward as far as possible. If 
the microphone slips from this position while speaking, it 
should be readjusted before continuing. 

Data in the T600 memory is stored in the main terminal 
processor unit. In conjunction with this the magnetic tape 
cartridge unit, a digital tape recorder, is used to store 
this memory data on a tape cartridge and then to recall it 
from the cartridge whenever desired. The tape, once recorded, 
can be used to quickly retrain the terminal with the user's 
speech patterns and specific vocabulary. This is very useful 
when the terminal is used repeatedly by a number of different 
users.

For this experiment two additional pieces of equipment 
were connected in parallel with the T600 described above. An 
ADM 31 Data Display Terminal with print much smaller than 
that of the T600 was used so that the longer commands input 
by the user would entirely fit on a single line rather than 
"wrapping around" as they would on the T600 CRT. It was felt 
that this would eliminate one possible source of confusion 
for the test subjects. Additionally, a Miniterm Model 1203 
was used in order to obtain a hard copy printout of all the 
voice and manual input commands. This was necessary to 
accurately count and differentiate between the types of input
errors. This will be discussed at greater length in Section
IV. The entire equipment set-up as used in the experiment is 
shown in Figure 1. 

2. Available Input Modes

The speed of entry and number of errors associated 
with three different input modes were to be evaluated in the 
experiment. Each subject would type the BLUE player commands 
at the ADM 31 terminal, would enter the same commands using 
the unbuffered voice mode of the T600 and would enter the 
commands via the T600's buffered voice mode. The order of 
the input modes was varied from subject to subject in order 
to eliminate any bias which the ordering might have introduced. 

In the typing mode with WES there is no way of cor- 
recting any error once it is typed prior to sending it to the 
game for execution (i.e., no backspace or erase). This is 
quite important since a single error will invalidate an entire 
WES command. If an error is made it is best to immediately
type a carriage return (entering the incorrect order), and
then retype the order correctly and enter it into the system.
By doing this, time is saved which would otherwise be wasted
while completely entering a command already containing an 
error, and the possibility of committing further errors in 
this same command is eliminated. 

The unbuffered voice input mode to WES is very 
similar to this. The T600 will send the ASCII character 
stream associated with any recognized voice input to the
user's host computer without the user being able to correct
any "incorrectly recognized" spoken input. No editing is 
possible in the unbuffered voice mode of operation, and there- 
fore, like typing, when an error is noted it is best to enter 
the command at that point and then reenter it correctly. In 
contrast to this the T600's buffered voice mode allows the 
user to verify his input stream, make corrections to it if 
necessary and then transmit it to the host computer. The 
T600 stores the utterances in an internal buffer which may 
be modified and the contents of this buffer are sent to the 
host in a "block" when the user transmits them. 
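
The difference between the two voice modes can be sketched as follows. This is only a model of the behavior described above; the class names and the erase_last editing operation are hypothetical, since the actual T600 editing commands are not listed here.

```python
class UnbufferedVoice:
    """Each recognized utterance goes to the host immediately."""

    def __init__(self, send):
        self.send = send  # callable that delivers text to the host

    def recognized(self, output_string):
        self.send(output_string)  # no chance to correct a misrecognition


class BufferedVoice:
    """Recognized utterances are held until the user transmits them."""

    def __init__(self, send):
        self.send = send
        self.buffer = []

    def recognized(self, output_string):
        self.buffer.append(output_string)

    def erase_last(self):
        # Hypothetical editing operation: drop the most recent utterance.
        if self.buffer:
            self.buffer.pop()

    def transmit(self):
        # The whole buffer is sent to the host as one block.
        self.send("".join(self.buffer))
        self.buffer.clear()
```

In the buffered model a misrecognized utterance can be removed before the host ever sees it, whereas in the unbuffered model the only remedy is to enter the faulty command and start again.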

C. SELECTION OF SUBJECTS 
1. Backgrounds

Twelve subjects who participated on a voluntary basis 
were chosen for the experiment. Eleven of the subjects are 
military officers (six Navy, four Air Force and one Army) in 
paygrades O-3 to O-5, and one is a civilian professor at NPS.
Ten of the military officers are members of the Command,
Control and Communications (C3) curriculum at NPS and the
eleventh is on the faculty. Two of the twelve subjects 
are female Naval officers. 

All subjects had previously had at least a brief 
exposure to WES while at NPS. However, only one subject, 
the female faculty member, was considered to be experienced 
with WES. In addition, all the subjects had at least minimal
experience with voice recognition systems, with six of the
subjects considered "experienced" with voice systems. Four
subjects had gained this experience by participating in a
six-month controlled longitudinal voice recognition study,
and the two faculty members by continuous use of voice
systems over a prolonged period of time (more than three
years for the civilian professor). This breakdown of
six experienced subjects and six inexperienced with voice 
recognition systems was planned in order to determine whether 
prior experience would be a significant factor in determining 
the preferred method of command input to WES. A synoptic 
background of the twelve test subjects is contained in 
Appendix C. 

2. Initial Training

Each of the subjects met individually with the author 
and was given a typing ability test. This consisted of a 
five minute typing exercise (similar to that given to a GS-2 
typist) during which the subject was instructed to type two 
given paragraphs totalling 21 lines as quickly and accurately 
as possible without error correction. A subject's speed in 
words per minute (wpm) was then calculated from a scoring
table that approximately follows the formula wpm = total
characters/25. A certain number of errors, increasing with the
number of gross words per minute typed, was permitted, with
each error in excess of this number resulting in 0.2 wpm being
subtracted from the final typing speed.

The typing ability test was given to determine whether
there was a clear cut distinction between typists and non- 
typists among the test subjects. Although one subject typed 
below 20 wpm and two subjects were above 40 wpm, nine of the 
subjects were grouped between 21 and 39 wpm. Due to this 
close grouping and the rather short length of the WES com- 
mands this difference in typing speeds was not considered 
important. The typing test used, along with its scoring
matrix, is shown in Appendix D. 
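
The scoring rule of the typing test can be sketched in a few lines. The divisor of 25 and the 0.2 wpm penalty come from the description above; the number of permitted errors comes from the scoring matrix of Appendix D, which is not reproduced here, so it is passed in as a parameter. The function name is hypothetical.

```python
CHARACTERS_PER_WPM = 25    # wpm = total characters / 25 (from the text)
PENALTY_PER_ERROR = 0.2    # wpm subtracted per excess error (from the text)


def typing_speed(total_characters, errors, allowed_errors):
    """Approximate net typing speed in words per minute.

    allowed_errors must be looked up in the Appendix D scoring matrix;
    it is an input here because that matrix is not reproduced.
    """
    gross_wpm = total_characters / CHARACTERS_PER_WPM
    excess_errors = max(0, errors - allowed_errors)
    return gross_wpm - PENALTY_PER_ERROR * excess_errors
```

For example, a subject who typed 750 characters with two errors more than the permitted number would score 30 - 0.4 = 29.6 wpm under this approximation.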

Each of the subjects next trained the T600 voice 
recognition unit using the WES vocabulary of Appendix B.
This training was accomplished by having the subjects repeat
each vocabulary utterance ten times while in the T600 train- 
ing mode in order to optimize the stored reference patterns 
for their individual speech variations. The average time 
required to train the 162 utterance vocabulary was 94 minutes, 
with the shortest time being 69 minutes and the longest 116
minutes.

Once the training was completed each utterance was
repeated three additional times while in the T600's recogni-
tion mode to check for recognition accuracy. If at least
two of the three repetitions of an utterance were correctly
recognized, the utterance was considered to be properly
trained. If not, that vocabulary word was then retrained and
again checked for accuracy. On the average each subject
retrained five utterances (three being the least number
retrained and nine the highest), with phonetically similar
expressions such
as HARM/ARM, with/list, back/track/attack and dive/five caus- 
ing the most difficulty. 
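
The two-out-of-three acceptance rule used after training can be written down directly. The sketch below is an illustration only; the function names and the dictionary layout are hypothetical.

```python
REQUIRED_CORRECT = 2  # at least two of the three check repetitions (from the text)


def properly_trained(check_results):
    """check_results: three booleans, True where the repetition was recognized."""
    return sum(1 for ok in check_results if ok) >= REQUIRED_CORRECT


def utterances_to_retrain(results_by_utterance):
    """Return the vocabulary entries that failed the check, in input order."""
    return [
        label
        for label, results in results_by_utterance.items()
        if not properly_trained(results)
    ]
```

Applied to a check in which HARM was recognized once and ARM twice, only HARM would be returned for retraining.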

D. CONDUCTING THE EXPERIMENT 

For the experiment the author had compiled a list of 20 
basic WES commands for the CUBA scenario. These 20 commands 
(Appendix E) totalled 272 voice utterances and used 67 of the 
162 vocabulary utterances considered necessary to run an 
actual WES war game. The author had further divided these 
20 commands into five shorter groups of four commands each 
(Appendix F). The commands in these five groups were arranged
so that each group would be of approximately the same length. 
(Those utterances in Appendices E and F which consisted of 
more than one word are highlighted as they were for the sub- 
jects during the experiment.) 

Each subject would be required to input the list of 20 
commands and the five shorter lists of commands by the three 
methods of typing, unbuffered voice and buffered voice. The 
order of the input methods and the lists of commands used 
was randomly varied from subject to subject to eliminate any 
bias. When inputting the short lists, whether by typing or
voice, the subjects were given a brief rest between each of
the five lists. The use of the 20 command list and the group
of five lists with breaks between each was designed to see
whether fatigue, frustration, or the prospect of having a
long or short task ahead might have any effect on the
results of the different entry methods.

The conceptual design of the experiment is shown in 
Figure 2. This is a three-factor nested design with repeated
measures over the tasks. Each subject is nested within only 
one of the levels of experience. 
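
The size of this design can be worked out with a short sketch. The factor levels and the six subjects per experience level are taken from the text; the function name is hypothetical. The resulting degrees of freedom agree with those reported in the analysis of variance tables of Section IV.

```python
from itertools import product

EXPERIENCE_LEVELS = ("experienced", "inexperienced")              # between subjects
ENTRY_METHODS = ("typing", "unbuffered voice", "buffered voice")  # within subjects
TASK_TYPES = ("20 commands", "five groups of 4 commands")         # within subjects
SUBJECTS_PER_LEVEL = 6


def degrees_of_freedom(n_entry_methods):
    """Return (between-subjects df, within-subjects df) for the design."""
    subjects = len(EXPERIENCE_LEVELS) * SUBJECTS_PER_LEVEL
    observations = subjects * n_entry_methods * len(TASK_TYPES)
    between = subjects - 1
    within = (observations - 1) - between
    return between, within


cells = list(product(ENTRY_METHODS, TASK_TYPES))
print(len(cells))                               # repeated measures per subject
print(degrees_of_freedom(len(ENTRY_METHODS)))   # time, operator and total errors
print(degrees_of_freedom(2))                    # recognition errors: voice modes only
```

With three entry methods each subject contributes six measurements, giving 11 between-subjects and 60 within-subjects degrees of freedom; with the two voice modes alone the within-subjects figure is 36.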

Once each subject had finished training the T600 he met 
at a later time with the author to conduct the actual 
experiment. At this point the subjects were given a brief 
overview of what they would be doing along with a verbal set 
of instructions (Appendix G). Since in some cases it had
been several weeks since the initial voice training all the 
subjects were given a copy of the WES vocabulary in order to 
refresh their memories. In addition the subjects were pro- 
vided a list of practice commands (Appendix H) with which they 
were allowed to train until they felt at ease and confident 
with the use of the voice recognition system. 

After each subject felt satisfied with his practice the 
experiment was run. The entire list of 20 commands and the 
five groups of commands, depending on the order used, were 
entered into the WES game via the three different input 
methods. While using the voice recognition modes, if an
utterance was misrecognized four consecutive times or an ab-
normally large number of times throughout the experiment, the
author stopped the clock and had the subject retrain that
utterance rather than continue to struggle against the
system. This was done on six occasions. The results of the
experiment are contained in Section IV.

IV. PRESENTATION OF DATA



A. DATA COLLECTION TECHNIQUES 

During the typing and the unbuffered voice modes of the 
experiment the Miniterm was used to keep a typescript of all 
commands entered by the subjects and the responses of the 
WES game. During the buffered voice mode the Miniterm was 
not used since the only commands which would have been printed 
were those already corrected by the subject and sent contain- 
ing no errors from the internal buffer. Instead the author 
manually recorded errors during this phase. 

The following measures of performance were recorded during 
all the trials: 1) the time required to complete a specific
scenario, and 2) the number of input command errors. Input
errors were divided into two types, recognition errors and 
operator errors. Recognition errors were those encountered 
when the T600 "thought" the subject said one thing but he 
had actually said another. This type of error was not 
applicable to the typing mode. An operator error was any 
other type of error committed which was not attributable to 
the T600 (e.g., a typing mistake, the operator forgetting
to say "space" after a number, the operator saying "for"
(and having it recognized as "4") rather than "for the,"
etc.).

In analyzing the data the author was interested in the
actual number of errors committed. Therefore every single 
error was counted as a separate error. For example, if the 
subject made one typing error, or had one voice utterance 
misrecognized during a command, this was counted as one error. 
However, if the subject committed two typing errors in the 
same command before entering the command, this was counted as 
two errors although they only invalidated a single command. 

B. GENERAL RESULTS 

As noted earlier, each set of 20 voice commands contained 
272 voice utterances. Each subject was required to input the 
total 20 commands four different times by voice (i.e., the 
list of 20 commands by buffered and unbuffered voice, and 
the five groups of four commands in the same manner). There-
fore, if no voice errors had been committed, the twelve sub-
jects would have input a total of 13,056 voice utterances
during the experiment. However, the occurrence of both 
recognition and operator errors, and having to reenter the 
commands which contained these errors, resulted in a some- 
what greater number of voice utterances for the experiment. 
(The author did not physically count this total number.) 

There were 982 recognition errors recorded during the 
experiment. 
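
The utterance count quoted above follows from simple multiplication, as the sketch below shows. The final ratio is only a rough upper bound on the recognition error rate: it assumes each recognition error corresponds to one spoken utterance, and the true number of utterances spoken was somewhat larger than the minimum and was not counted.

```python
UTTERANCES_PER_20_COMMANDS = 272   # from the text
VOICE_PASSES_PER_SUBJECT = 4       # two task types x two voice modes
SUBJECTS = 12
RECOGNITION_ERRORS = 982           # from the text

# Utterances needed if no command ever had to be reentered.
minimum_utterances = (
    UTTERANCES_PER_20_COMMANDS * VOICE_PASSES_PER_SUBJECT * SUBJECTS
)

# Upper bound only: the real denominator was larger than the minimum.
max_recognition_error_rate = RECOGNITION_ERRORS / minimum_utterances

print(minimum_utterances)
print(round(100 * max_recognition_error_rate, 1))
```

On these figures the overall recognition error rate was no more than about 7.5 percent of the utterances spoken.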

After analyzing the typescript from the unbuffered voice 
portion of the experiment, it was found that of the 67
utterances used to form the 20 WES commands, 46 had been
misrecognized at least once as some other vocabulary
utterance. Twenty-one of the utterances were
never misrecognized by the T600. In the buffered mode only 
the numbers of recognition errors were recorded rather than 
the misrecognized words since the author was not able to 
keep an accurate record of these. 

There were more total errors with each of the voice input 
modes than with the typing mode. The following data were 
found when looking at total number of errors (recognition 
errors + operator errors): typing, 169 total errors; buffered
voice, 542; and unbuffered voice, 701. These figures show
that the typing mode had 68.8 percent fewer total errors than 
did the buffered voice mode, and 75.9 percent fewer errors 
than the unbuffered voice mode. 

All of the subjects, regardless of typing ability, had 
been inputting data via a keyboard for at least five quarters 
while at NPS, while only six were considered experienced in 
voice entry. In addition, subjects seemed to try to be quite 
precise while typing at the keyboard where they had total 
control over any errors committed as opposed to voice input 
where the T600 might not recognize their utterance. 

As far as time was concerned, the total time required
for all the subjects' inputs was 254.35 minutes for typing,
286.17 minutes for buffered voice and 585.7 minutes for un-
buffered voice. Therefore typing took 11.1 percent less time
than buffered voice, and 56.6 percent less time than unbuffered
voice input.
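
The percentages quoted in this section for errors and for time are all reductions relative to the voice mode being compared. A minimal sketch of the calculation, using the totals given above (the function name is hypothetical):

```python
def percent_less(value, baseline):
    """Percentage by which value falls below baseline."""
    return 100.0 * (baseline - value) / baseline


# Total errors: typing 169, buffered voice 542, unbuffered voice 701.
errors_vs_buffered = round(percent_less(169, 542), 1)
errors_vs_unbuffered = round(percent_less(169, 701), 1)

# Total time in minutes: typing 254.35, buffered 286.17, unbuffered 585.7.
time_vs_buffered = round(percent_less(254.35, 286.17), 1)
time_vs_unbuffered = round(percent_less(254.35, 585.7), 1)

print(errors_vs_buffered, errors_vs_unbuffered)
print(time_vs_buffered, time_vs_unbuffered)
```

These reproduce the figures of 68.8 and 75.9 percent for errors and 11.1 and 56.6 percent for time.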

C. RESULTS FOR SCENARIO TIMES 

Table 1 shows the time in minutes required for each sub- 
ject to input the list of 20 commands by the three entry 
methods, and Table 2 shows this data for the five groups of 
commands. An analysis of variance [14] was performed on this 
time data and Table 3 gives the statistical results. (The 
task of inputting either the 20 commands or the five groups 
of commands is hereafter referred to as the Task Type.) 

Table 3 shows that there was a statistically significant
difference (at the α = .10 level) in time for experience
level, as can be seen in Figure 3. (An α level of .10, for
example, means that if there were truly no difference there
would be at most a 10 percent chance of wrongly declaring
a significant difference.) The experienced subjects were
able to input the commands faster via all three entry methods, 
and most noticeably by unbuffered voice where the average 
time climbed most steeply for the inexperienced subjects. 

Table 3 also shows that there was a significant difference 
(α = .01) in time for entry method. A range test [15] showed
that there was a significant improvement in time with both 
typing and buffered voice over unbuffered voice, and that 
there was no difference between typing and buffered voice as 
far as time is concerned. These results are shown in Figure
4.

Table 1. Time for 20 Commands (minutes)

                        UNBUFFERED    BUFFERED
SUBJECT       TYPE        VOICE        VOICE

   1         11.97        20.05        12.80
   2          5.55        20.22        16.62
   3          7.42        12.35         7.52
   4         20.37        28.77        11.72
   5          9.33        15.00        10.43
   6          9.43         6.80         7.82
   7         15.40        40.32        14.40
   8          8.67        76.88        15.82
   9          9.47        18.80         9.40
  10         12.67        20.57        13.13
  11          9.32        11.78        10.00
  12         11.15        36.40        10.80

Table 2. Time for Five Groups of 4 Commands Each (minutes)

                        UNBUFFERED    BUFFERED
SUBJECT       TYPE        VOICE        VOICE

   1         10.27        22.40        11.73
   2          8.80        20.97        12.88
   3          9.65        10.32         9.23
   4         14.88        18.03         9.62
   5          8.10        11.78         9.05
   6         10.27        10.48         8.22
   7         11.35        44.23        16.98
   8          8.07        56.18        17.05
   9          9.42        23.22        11.28
  10         12.95        15.20        10.57
  11          8.52        21.85        14.85
  12         11.32        23.10        13.85

Table 3. Analysis of Variance for Time

SOURCE                      df          MS            F

Between Subjects            11
  EL (experience level)      1       700.1282      3.8287*
  Error(b)                  10       182.8630

Within Subjects             60
  EM (entry method)          2      1392.5246     10.9110**
  TT (task type)             1        15.0152       .8093
  EL x EM                    2       432.8107      3.3912*
  EL x TT                    1          .0612       .0033
  EM x TT                    2        18.8148      1.312
  EL x EM x TT               2         3.0497       .2126
  Error(1)                  20       127.6252
  Error(2)                  10        18.5524
  Error(3)                  20        14.34

 * p < .10
** p < .01

df: degrees of freedom
MS: Mean Square
F:  F test ratio

Figure 3. Average Time for Different Entry Methods

There was also significant experience level-by-entry method
interaction shown in Table 3. This is shown in both Figures 
3 and 4 and was due mainly to the effect of the inexperienced 
subjects with unbuffered voice where the average time increased 
much more quickly than it did for the experienced subjects. 

Table 3 further shows that there was no difference in the 
two task types with respect to time. There was also no other 
significant interaction shown. 
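
Each F ratio in Table 3 is the mean square of an effect divided by the mean square of its error term. The sketch below recomputes the ratios from the tabled mean squares. The pairing of each effect with one of the four error terms (labelled here b, 1, 2 and 3 in the order they are listed) is not stated explicitly in the table; it is inferred from the degrees of freedom and from the fact that these pairings reproduce the tabled F values.

```python
# Effect mean squares from Table 3, each with its assumed error term.
EFFECT_MS = {
    "EL": (700.1282, "b"),
    "EM": (1392.5246, 1),
    "TT": (15.0152, 2),
    "EL x EM": (432.8107, 1),
    "EL x TT": (0.0612, 2),
    "EM x TT": (18.8148, 3),
    "EL x EM x TT": (3.0497, 3),
}

# Error mean squares from Table 3.
ERROR_MS = {"b": 182.8630, 1: 127.6252, 2: 18.5524, 3: 14.34}


def f_ratios():
    """Return effect -> F ratio (effect MS divided by its error MS)."""
    return {
        effect: ms / ERROR_MS[term]
        for effect, (ms, term) in EFFECT_MS.items()
    }


for effect, f in f_ratios().items():
    print(f"{effect:<14}{f:8.4f}")
```

The recomputed ratios agree with the F column of Table 3 to within rounding in the last decimal place.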

D. RESULTS FOR INPUT ERRORS 

1. Recognition Errors

The total number of recognition errors for each sub- 
ject in the two voice entry modes for 20 commands is given 
in Table 4. Table 5 shows this data for the five groups of 
commands. The results of the analysis of variance for this 
data are given in Table 6.

Table 6 shows that there was no significant difference for
experience level, entry method or task type with respect
to recognition errors. Although it is not surprising
that the entry method and the task type make no difference 
as far as recognition errors are concerned, it is somewhat 
surprising that experience level does not. The author would 
have thought the opposite to be true, with experienced sub- 
jects having significantly fewer recognition errors. 

Table 6 does, however, show a significant (α = .05) inter-
action between entry method and task type as depicted in
Figure 5. Although the average number of recognition errors
was greater for 20 commands than for the five groups in un-
buffered voice, the opposite was true for buffered voice.
There is also a significant three-way interaction shown in
Table 6 between experience level, entry method and task type.
This interaction is shown in Figure 6.

Table 4. Recognition Errors for 20 Commands

              UNBUFFERED    BUFFERED
SUBJECT         VOICE        VOICE

   1             19           18
   2             24           62
   3              5            2
   4             28            8
   5             16            8
   6              2            3
   7             69           31
   8             81           26
   9              9            5
  10              6            6
  11              5           16
  12             29           20

Table 5. Recognition Errors for Five Groups of 4 Commands Each

              UNBUFFERED    BUFFERED
SUBJECT         VOICE        VOICE

   1             25           14
   2             15           43
   3              4            7
   4             13            6
   5             13           15
   6              6            3
   7             52           33
   8             70           24
   9             17           15
  10              5            7
  11             16           43
  12             14           24

Table 6. Analysis of Variance for Recognition Errors

SOURCE                      df          MS            F

Between Subjects            11
  EL (experience level)      1      1452           1.5332
  Error(b)                  10       946.9916

Within Subjects             36
  EM (entry method)          1       225.3333       .5114
  TT (task type)             1         4.0833       .0510
  EL x EM                    1       420.0833       .9535
  EL x TT                    1        48            .60
  EM x TT                    1       108           5.1695*
  EL x EM x TT               1        80.0834      3.8332**
  Error(1)                  10       440.5583
  Error(2)                  10        79.9916
  Error(3)                  10        20.8916

 * p < .05
** p < .10

df: degrees of freedom
MS: Mean Square
F:  F test ratio
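
Because entry method and task type each have two levels in this analysis, their mean squares in Table 6 can be recovered from the column totals of Tables 4 and 5. The sketch below is a check under that assumption, not the author's actual computation, which is not described; the counts are copied from Tables 4 and 5 and the function name is hypothetical.

```python
# Recognition errors per subject, copied from Tables 4 and 5.
UNBUFFERED_20 = [19, 24, 5, 28, 16, 2, 69, 81, 9, 6, 5, 29]
BUFFERED_20 = [18, 62, 2, 8, 8, 3, 31, 26, 5, 6, 16, 20]
UNBUFFERED_5 = [25, 15, 4, 13, 13, 6, 52, 70, 17, 5, 16, 14]
BUFFERED_5 = [14, 43, 7, 6, 15, 3, 33, 24, 15, 7, 43, 24]

N_OBSERVATIONS = 48  # 12 subjects x 2 entry methods x 2 task types


def contrast_ms(plus_total, minus_total):
    """Mean square of a one-degree-of-freedom contrast between two totals."""
    return (plus_total - minus_total) ** 2 / N_OBSERVATIONS


u20, b20 = sum(UNBUFFERED_20), sum(BUFFERED_20)
u5, b5 = sum(UNBUFFERED_5), sum(BUFFERED_5)

total_recognition_errors = u20 + b20 + u5 + b5
ms_entry_method = contrast_ms(u20 + u5, b20 + b5)
ms_task_type = contrast_ms(u20 + b20, u5 + b5)
ms_interaction = contrast_ms(u20 + b5, b20 + u5)

print(total_recognition_errors)
print(round(ms_entry_method, 4), round(ms_task_type, 4), ms_interaction)
```

The four columns sum to the 982 recognition errors reported in Section IV.B, and the three mean squares match the EM, TT and EM x TT entries of Table 6.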

2. Operator Errors

Operator errors were all errors other than those 
caused by the T600 voice recognition unit. This included 
such things as typing and spelling errors in the typing mode, 
and basically forgetting the various ground rules, and there- 
fore causing mistakes, while using the voice modes. Table 7 
shows the number of operator errors committed while inputting 
the list of 20 commands, and Table 8 gives this information 
for the groups of commands. Table 9 shows the results of the 
ANOVA performed on this data.

Table 9 shows a statistically significant difference
at the α = .05 level in operator errors for entry method. A
range test showed a significant decrease in operator errors 
for buffered voice as compared to both unbuffered voice and 
typing. The range test showed no difference between the 
typing and unbuffered voice modes with respect to operator 
errors. This is shown in Figure 7 where buffered voice has 
fewer operator errors than the other input methods for both 
experience levels.

Figure 5. Average Number of Recognition Errors for Different Entry Methods

Figure 6. Average Number of Recognition Errors for Different Experience Levels

Table 7. Operator Errors for 20 Commands

                       UNBUFFERED
SUBJECT      TYPE        VOICE

   1          13           10
   2           1            9
   3           4            8
   4           6           15
   5           4            4
   6          10            1
   7           7            6
   8           9           11
   9           5            4
  10           4            2
  11           3            6
  12           5            3

[BUFFERED VOICE column not recoverable from the source.]

Table 8. Operator Errors for Five Groups of 4 Commands Each

                       UNBUFFERED    BUFFERED
SUBJECT      TYPE        VOICE        VOICE

   1          11            7            6
   2          10           12            7
   3          13            8            5
   4           6            5            6
   5           3            2            1
   6          16           10            3
   7           4            3            5
   8           9            8            5
   9           4            5            4
  10           9            3            2
  11           7           15            5
  12           6            1            5

Table 9. Analysis of Variance for Operator Errors

SOURCE                      df          MS            F

Between Subjects            11
  EL (experience level)      1        46.7222      1.8772
  Error(b)                  10        24.8888

Within Subjects             60
  EM (entry method)          2        52.0972      5.3311*
  TT (task type)             1        14.2222      1.0314
  EL x EM                    2         3.3472       .3425
  EL x TT                    1          .2223       .0161
  EM x TT                    2         8.5972      1.3147
  EL x EM x TT               2         5.8472       .8942
  Error(1)                  20         9.7722
  Error(2)                  10        13.7888
  Error(3)                  20         6.5388

* p < .05

df: degrees of freedom
MS: Mean Square
F:  F test ratio

Figure 7. Average Number of Operator Errors for Different Experience Levels

Table 9 shows that there is no significant difference
in operator errors over either experience level or task type.
There are also no significant interactions shown in the
table.

3. Total Errors

The total errors are the sum of the recognition and 
operator errors. The total number of errors for each subject 
is given in Table 10 for the task of entering 20 commands, and 
in Table 11 for the groups of commands. As for the other 
types of errors an analysis of variance was performed on this 
data, with the results presented in Table 12. 
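
The relationship between the three error tables can be checked directly. The sketch below does so for the unbuffered voice mode on the 20-command task, using the per-subject counts of Tables 4, 7 and 10; it is offered only as a consistency check, and the variable names are hypothetical.

```python
# Unbuffered voice, 20 commands, subjects 1 through 12.
RECOGNITION = [19, 24, 5, 28, 16, 2, 69, 81, 9, 6, 5, 29]   # Table 4
OPERATOR = [10, 9, 8, 15, 4, 1, 6, 11, 4, 2, 6, 3]          # Table 7
TOTAL = [29, 33, 13, 43, 20, 3, 75, 92, 13, 8, 11, 32]      # Table 10

# Total errors are recognition errors plus operator errors.
computed_total = [r + o for r, o in zip(RECOGNITION, OPERATOR)]

mismatches = [
    subject
    for subject, (c, t) in enumerate(zip(computed_total, TOTAL), start=1)
    if c != t
]
print(mismatches)
```

An empty list of mismatches confirms that, for every subject, the total in Table 10 is the sum of the corresponding entries in Tables 4 and 7.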

The results of the ANOVA show a significant differ- 
ence in total errors for entry method. A range test showed 
a significant decrease in total errors for the typing mode 
when compared with both unbuffered and buffered voice. There 
was no significant difference between the two different voice 
input modes. This result is shown in Figure 8. IT MUST BE
REMEMBERED, HOWEVER, THAT THE TYPING MODE DID NOT INCLUDE
RECOGNITION ERRORS, WHEREAS THE TWO VOICE MODES DID. THERE-
FORE, FOR THE VOICE MODES TOTAL ERRORS ARE THE SUM OF OPERATOR
AND RECOGNITION ERRORS, WHILE FOR TYPING TOTAL ERRORS ARE THE
SAME AS OPERATOR ERRORS. THIS CAN BE SEEN BY COMPARING THE
CURVES FOR TYPING IN FIGURES 7 AND 8 WHICH SHOW TYPING WITH
THE EXACT SAME TREND BECAUSE THERE COULD BE NO VOICE RECOGNI-
TION ERRORS UNDER THE TYPING METHOD.

Table 10. Total Errors for 20 Commands

                       UNBUFFERED
SUBJECT      TYPE        VOICE

   1          13           29
   2           1           33
   3           4           13
   4           6           43
   5           4           20
   6          10            3
   7           7           75
   8           9           92
   9           5           13
  10           4            8
  11           3           11
  12           5           32

[BUFFERED VOICE column not recoverable from the source.]

Table 11. Total Errors for Five Groups of 4 Commands Each

                       UNBUFFERED    BUFFERED
SUBJECT      TYPE        VOICE        VOICE

   1          11           32           20
   2          10           27           50
   3          13           12           12
   4           6           18           12
   5           3           15           16
   6          16           16            6
   7           4           55           38
   8           9           78           29
   9           4           22           19
  10           9            8            9
  11           7           31           48
  12           6           15           29

Table 12. Analysis of Variance for Total Errors

SOURCE                      df          MS            F

Between Subjects            11
  EL (experience level)      1       589.3889       .7749
  Error(b)                  10       760.5722

Within Subjects             60
  EM (entry method)          2      3107.1805      7.6051*
  TT (task type)             1         4.5          .0524
  EL x EM                    2       442.1805      1.0822
  EL x TT                    1        26.8888       .3136
  EM x TT                    2        75.5416      1.8751
  EL x EM x TT               2        66.2639      1.6448
  Error(1)                  20       408.5638
  Error(2)                  10        85.7277
  Error(3)                  20        40.2861

* p < .01

df: degrees of freedom
MS: Mean Square
F:  F test ratio

Figure 8. Average Number of Total Errors for Different Experience Levels

Table 12 shows that there is no significant difference
in total errors over either experience level or task type.
In addition, there are no significant interactions in the
area of total errors.

V. CONCLUSIONS AND RECOMMENDATIONS



A. EXPERIMENTAL CONCLUSIONS 

Based on the results of this experiment, twelve test 
subjects were able to input WES commands to the war game 
faster and with fewer total errors using the manual typing 
input mode than with the two voice input modes. Experienced
voice subjects input the commands faster than the inexperienced 
subjects, but experience level made no difference as far as 
the total number of errors committed was concerned. Typing 
was significantly better with respect to total errors, but there
was no statistical difference between typing and buffered 
voice modes as far as time was concerned. Finally, for time 
and total errors, it made no difference which of the two task 
types was being performed. 

The results suggest that manual input is certainly supe- 
rior to unbuffered voice, and in some respects to buffered 
voice input in this experiment. However, the author feels 
that this must be qualified by looking at the unique situa-
tion in which the input methods were being used. WES com-
mands are rigidly formatted and must be entered with no errors.
This requirement caused many commands to be rejected and
made unbuffered voice input with WES clearly infeasible.
It simply took too long and resulted
in too many errors. The buffered voice mode held its own
with typing when considering input time, and it was actually
better than typing for operator errors.

In a task such as running WES, where every player command
must be entered perfectly before the game will accept it, the
error rates of two input methods appear to be of little
consequence if the time the methods require is the same. In
this experiment there was no statistically significant
difference in time between the typing and buffered voice
input methods, so the fact that buffered voice had more total
errors makes no practical difference. The lists of commands
were input and accepted by WES in the same amount of time
regardless of errors.
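
The reasoning in this paragraph, that a rejected command costs
time and nothing else, can be sketched in a few lines of Python.
This is only an illustration: the Attempt record, the function
name and every number below are hypothetical and are not data
from this experiment.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One entry attempt for a command (hypothetical record layout)."""
    seconds: float   # time spent on this attempt
    accepted: bool   # True if WES accepted the command


def time_to_acceptance(attempts):
    """Return (total seconds, number of rejected attempts).

    Rejected attempts are not penalized separately; their only cost
    is the time they add before the command is finally accepted.
    """
    total = sum(a.seconds for a in attempts)
    errors = sum(1 for a in attempts if not a.accepted)
    return total, errors


# Invented numbers, chosen only to exercise the function.
mode_a = [Attempt(10.0, True), Attempt(12.0, True)]
mode_b = [Attempt(6.0, False), Attempt(7.0, True), Attempt(9.0, True)]

print(time_to_acceptance(mode_a))  # (22.0, 0)
print(time_to_acceptance(mode_b))  # (22.0, 1)
```

Under this view, two input modes that reach full acceptance in
the same total time are equivalent for the task, even when one
of them needed more attempts.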

There are also possible intangible benefits associated 
with the use of voice input to a computer, whatever its pur- 
pose might be. One such benefit might be the ability of 
supervisors or commanders to hear what is being told to or 
asked of a computer while they are still engaged in other 
activities. This would eliminate several people leaning 
over the shoulder of the operator trying to see what he is 
typing into the computer, allowing the operator to perform 
his job more easily and probably increasing the total effi- 
ciency in the work area. 

B. RECOMMENDATIONS FOR FURTHER STUDY 

Voice recognition, although very promising in many fields,
is certainly not a panacea for all areas of computer input,
as the unbuffered voice results with WES show. This should
not, however, slow down the research being done in the field
of voice recognition. Studies cited earlier point out very
promising uses of voice recognition. The author believes that
further research should be done, using the buffered voice
mode, during WES games to test its validity in actual use.
This could be done quite easily as thesis research work at
NPS, in the C3 laboratory course at NPS, or in conjunction
with scheduled war games involving NPS, CINCPACFLT and NOSC.

In this experiment the subjects were divided into expe- 
rienced and inexperienced groups as far as voice recognition 
systems were concerned. However, the fact that the subjects 
were not experienced with WES was never taken into account. 
Another possible experimental factor might be to compare the 
results of experienced and inexperienced WES users. Although 
increasing the variables like this would make it more diffi- 
cult to find the required number of subjects, this could be 
done at NPS in the curriculum where the students take 
almost all of the same classes for six quarters. 

Further research also should be done in the NPS RSM,
perhaps in conjunction with the WES games proposed above, to
study the effects of background and ambient noise on the
reliability of the voice recognition equipment. This noise
problem will surely arise in any operational use of voice
equipment in a command center, CIC or aircraft, and it could
easily be simulated by introducing noise while using the
equipment at NPS.

Research into the possible uses of voice recognition
equipment in aircraft, intelligence, war gaming and other
operational areas is presently ongoing at NPS. These efforts
will result in much new information on the uses and drawbacks
of automatic voice recognition. Truly operational study in
this field, rather than merely scholarly and scientific study,
must be continued if we are to reap any benefits from this
new technology. This should be an ongoing endeavor at NPS,
and in the curriculum particularly, where there is such
promise and demand for this type of technology today and in
the future.

APPENDIX A



VOICE STUDIES AT NPS 



This thesis is one of several voice recognition research 
projects conducted for Professor G. K. Poock at NPS over the 
last several years. The complete list, in addition to this 
thesis, includes: 

Armstrong, J. W., The Effects Of Concurrent Motor Tasking On
Performance Of A Voice Recognition System, Master's Thesis,
Naval Postgraduate School, Monterey, 1980.

Batchellor, M. P., Investigation Of Parameters Affecting Voice
Recognition Systems In C3 Systems, Master's Thesis, Naval
Postgraduate School, Monterey, 1981.

Bragaw, P. H., Investigation Of Voice Input For Constructing
Joint Chiefs Of Staff Emergency Action Messages, Master's
Thesis, Naval Postgraduate School, Monterey, 1981.

Jay, G. T., An Experiment In Voice Data Entry For Imagery
Intelligence Reporting, Master's Thesis, Naval Postgraduate
School, Monterey, 1981.

Naval Postgraduate School Report NPS54-80-010, The Effects
Of Certain Background Noises On The Performance Of A Voice
Recognition System, by R. Elster, September 1980.

Naval Postgraduate School Report NPS55-80-016, Experiments
With Voice Input For Command And Control: Using Voice Input
To Operate A Distributed Computer Network, by G. K. Poock,
April 1980.

Naval Postgraduate School Report NPS55-81-003, Examination Of
Voice Recognition System To Function In A Bilingual Mode, by
D. E. Neil and T. Andreason, February 1981.

Taggart, J. L. and Wolfe, C. D., Speech Recognition As An
Input Medium For Preflight In The P3C Aircraft, Master's
Thesis, Naval Postgraduate School, Monterey, 1981.

APPENDIX B

WES VOCABULARY

A. BASIC WES WORDS

one
two
three
four
five
six
seven
eight
nine
zero
a
s
e
air
all
altitude
at
attack
back
barrier
bearing
bingo
blue
by
cancel
carriage return
course
cover
degrees
delay
designate
distance
dive
drop
east
end
enemy
envelope
execute
ex sup
find distance from
fire
fire a
for
forces
friendly
go
guide
heading
help
if attacked
kill line
kill word
label
launch
lay a barrier from
lay a minefield from
list
maneuver delay
map
minefield
minutes
name
neutral
north
now
of
off
on
on contact
orange
orders
other
own
pass control of
place
place a circle
place a marker
player
plot
point
position
pounds from
probability
probability of detection
proceed
refuel
report
self
send it
sensor delay
south
space
speed
station
submarine
surface
target
time
to
track
unknown
using
west
what is the
with

B. SCENARIO SPECIFIC WES WORDS

A181
A182
A183
A6E1
A6E2
ALR59
ARM
ASMD
ASROC
AWG9
BERKELEY
Blue1
BPS14
BQQ3
CBU24
E2C
EA3
EA6B
ENTERPRISE
ESM
F14B
for BERKELEY
for ENTERPRISE
for STURGEON
G554
HARM
Harpoon
KA6D
Maverick
MK46
MK48
MK82
MK83
MK84
Phoenix
Phoenix
Redeye
RF18B
Sea Sparrow
Sidewinder
SLQ17
SLQ32
Sonobuoy Active
Sonobuoy Passive
Sparrow
SPN43
SPS10
SPS40
SPS48
SPS49
SQS23
STURGEON
Tartar2
Tomahawk
Walleye
Walleye2
WLR6

APPENDIX C

TEST SUBJECTS' BACKGROUNDS

Subject  Service  Sex  Position  Voice Experience  Typing Ability (wpm)

   1      USAF     M   student   experienced               32
   2      USN      F   student   experienced               59
   3      USAF     M   student   experienced               46
   4      USN      M   student   experienced               17
   5      USN      F   faculty   extensive                 34
   6      Civ      M   faculty   extensive                 39
   7      USN      M   student   minimal                   21
   8      USAF     M   student   minimal                   39
   9      USN      M   student   minimal                   38
  10      USAF     M   student   minimal                   37
  11      USA      M   student   minimal                   37
  12      USN      M   student   minimal                   26

APPENDIX D



TYPING ABILITY TEST 

Because they have often learned to know types of archi- 
tecture by decoration, casual observers sometimes fail to 
realize that the significant part of a structure is not the 
ornamentation but the body itself. Architecture, because of 
its close contact with human lives, is peculiarly and in- 
timately governed by climate. For instance, a home built for 
comfort in the cold and snow of the northern areas of this 
country would be unbearably warm in a country with weather 
such as that of Cuba. A Cuban house, with its open court, 
would prove impossible to heat in a northern winter. 

Since the purpose of architecture is the construction of 
shelters in which human beings may carry on their numerous 
activities, the designer must consider not only climatic con- 
ditions, but also the function of a building. Thus, although 
the climate of a certain locality requires that an auditorium 
and a hospital have several features in common, the purposes 
for which they will be used demand some difference in struc- 
ture. For centuries builders have first complied with these 
two requirements and later added whatever ornamentation they 
wished. Logically, we should see as mere additions, not as
basic parts, the details by which we identify architecture.

wpm (errors allowed) for the 1st and 2nd typing of the exercise
(5 minutes maximum)

Line Number    1st typing    2nd typing

     1           2 ( )         52 (7)
     2           5 ( )         54 (7)
     3           7 ( )         56 (8)
     4           9 ( )         59 (8)
     5          12 ( )         61 (9)
     6          14 ( )         64 (9)
     7          16 ( )         66 (10)
     8          18 ( )         68 (10)
     9          21 ( )         71 (11)
    10          23 ( )         73 (11)
    11          26 ( )         76 (12)
    12          28 ( )         78 (12)
    13          30 ( )         80 (12)
    14          33 ( )         --
    15          35 ( )         --
    16          38 ( )         --
    17          40 (3)         --
    18          42 (4)         --
    19          44 (5)         --
    20          47 (6)         --
    21          49 (6)         --

APPENDIX E

LIST OF 20 WES COMMANDS

1. For Enterprise launch 2 F14B course 090 altitude
    15000 bingo 999 name 1F14B.

2. For Berkeley attack enemy surface on contact using G554.

3. Find distance from Enterprise to 42N 57W.

4. For Sturgeon course 090 speed 15.

5. Place a circle Enterprise 150 time 15 999.

6. For Sturgeon report all surface using BQQ3.

7. For Berkeley fire a harpoon target enemy surface
    sensor delay 2 heading 120.

8. Pass control of 1F14B to Blue1.

9. For 1F14B lay a minefield from 26N 42W bearing
    135 distance 10 using MK82.

10. Place a marker 57N 71W time 23 300.

11. For 1F14B proceed course 215 distance 115.

12. For Berkeley station bearing 000 distance 3
    guide Enterprise.

13. For Sturgeon attack enemy submarine on contact
    using MK48.

14. For 1F14B altitude 20000 speed 600 course 090.

15. For Enterprise report all air using SPS49
    time 00 999.

16. For 1F14B attack enemy air on contact using Phoenix.

17. For Enterprise launch 1 KA6D course 000
    altitude 10000 bingo 120 name 1KA6D.

18. Plot all surface Enterprise 100.

19. For Berkeley report enemy forces using SLQ32
    time 00 120.

20. For 1F14B refuel 6000 pounds from 1KA6D.

APPENDIX F

FIVE LISTS OF FOUR WES COMMANDS

I.

1. For Berkeley attack enemy surface on contact using G554.

2. Find distance from Enterprise to 42N 57W.

3. For Berkeley fire a harpoon target enemy surface
    sensor delay 2 heading 120.

4. Plot all surface Enterprise 100.

II.

1. For Sturgeon course 090 speed 15.

2. Place a circle Enterprise 150 time 15 999.

3. For Enterprise launch 2 F14B course 090
    altitude 15000 bingo 999 name 1F14B.

4. For Berkeley report enemy forces using SLQ32
    time 00 120.

III.

1. Pass control of 1F14B to Blue1.

2. Place a marker 57N 71W time 23 300.

3. For Berkeley station bearing 000 distance 3
    guide Enterprise.

4. For Enterprise launch 1 KA6D course 000
    altitude 10000 bingo 120 name 1KA6D.

IV.

1. For Sturgeon report all surface using BQQ3.

2. For 1F14B proceed course 215 distance 115.

3. For Sturgeon attack enemy submarine on contact
    using MK48.

4. For 1F14B attack enemy air on contact using Phoenix.

V.

1. For 1F14B lay a minefield from 26N 42W
    bearing 135 distance 10 using MK82.

2. For Enterprise report all air using SPS49
    time 00 999.

3. For 1KA6D altitude 20000 speed 600 course 090.

4. For 1F14B refuel 6000 pounds from 1KA6D.

APPENDIX G

INSTRUCTIONS FOR SUBJECTS

You will be inputting a set of 20 commands and five sets
of four commands each to the WES game by typing, unbuffered
voice and buffered voice.

If you make a mistake in either the typing or the unbuffered
mode, enter a carriage return right away to save time, since
the mistake can't be corrected. Then reenter the command
correctly.

In the buffered mode you can use kill word or kill line
to make changes before entering your commands.
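
The buffered-mode editing described here can be pictured with a
small Python sketch. It is an illustration under stated
assumptions, not the T600's actual logic: the function name
apply_buffered is hypothetical, and it assumes that "kill word"
removes the most recent buffered utterance, "kill line" clears
the buffer, and "send it" releases the buffered command.

```python
def apply_buffered(utterances):
    """Return the commands released from the buffer, one string each.

    Assumed behavior (not taken from the equipment manual):
      "kill word" drops the last buffered utterance,
      "kill line" empties the buffer,
      "send it"   releases the buffer as one command.
    """
    buffer, sent = [], []
    for u in utterances:
        if u == "kill word":
            if buffer:
                buffer.pop()
        elif u == "kill line":
            buffer.clear()
        elif u == "send it":
            sent.append(" ".join(buffer))
            buffer.clear()
        else:
            buffer.append(u)
    return sent


# Numbers are shown as whole tokens for brevity; in practice each
# digit would be a separate utterance.
spoken = ["for STURGEON", "course", "speed", "kill word",
          "090", "speed", "15", "send it"]
print(apply_buffered(spoken))  # ['for STURGEON course 090 speed 15']
```

Here the misplaced "speed" is removed with a single "kill word"
before the command is released, so nothing is rejected by WES.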

Input the commands as quickly as possible since you are
being timed, but they must also be 100 percent accurate
and accepted by WES.

Remember to input a "space" after numbers you enter. All 
words automatically have a space with them. 

Remember that the words "for" and "to" were trained as 
"for the" and "to the" to differentiate them from the 
numbers 4 and 2. If you forget "the," the utterance will 
be recognized as the number. 
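
A minimal Python sketch of how these two rules interact,
assuming a simple lookup from trained utterance to output text.
The table OUTPUT, the table HEARD_AS and the function render are
hypothetical; only the behavior stated in these instructions
(words carry their own space, numbers do not, and a bare "for"
or "to" comes out as 4 or 2) is taken from the text.

```python
# Trained utterance -> text sent to WES (hypothetical subset).
OUTPUT = {
    "for the": "for ",   # trained with "the" so it is not heard as "four"
    "to the": "to ",     # trained with "the" so it is not heard as "two"
    "four": "4",         # digits carry no trailing space
    "two": "2",
    "zero": "0",
    "nine": "9",
    "one": "1",
    "five": "5",
    "space": " ",        # spoken explicitly after a number
    "course": "course ",
    "speed": "speed ",
    "Sturgeon": "Sturgeon ",
}

# What the recognizer is assumed to report when "the" is forgotten.
HEARD_AS = {"for": "four", "to": "two"}


def render(utterances):
    """Concatenate the output text for a sequence of spoken utterances."""
    return "".join(OUTPUT[HEARD_AS.get(u, u)] for u in utterances)


good = ["for the", "Sturgeon", "course", "zero", "nine", "zero", "space",
        "speed", "one", "five", "space"]
print(repr(render(good)))   # 'for Sturgeon course 090 speed 15 '

bad = ["for"] + good[1:]
print(repr(render(bad)))    # '4Sturgeon course 090 speed 15 '
```

The second call shows the failure the instructions warn about:
dropping "the" turns the leading word into a digit, and WES
would reject the command.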

All phrases which were trained as a single utterance 
(e.g., pass control of) are highlighted in yellow so you 
won't have to try to remember the phrases. Remember to 
speak them as a single utterance. 
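
To illustrate why the multi-word phrases matter, the Python
sketch below splits a written command into utterances by greedy
longest match against trained phrases. The phrase set is a small
subset of Appendix B; the function segment and the matching
strategy are assumptions made for illustration and do not
describe how the command lists were actually marked up.

```python
# A few phrases that Appendix B lists as single utterances.
PHRASES = {"pass control of", "fire a", "sensor delay", "on contact",
           "lay a minefield from", "find distance from"}
MAX_WORDS = max(len(p.split()) for p in PHRASES)


def segment(command):
    """Split a command into utterances, preferring the longest phrase."""
    words = command.lower().split()
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate first; a single word always matches.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + n])
            if n == 1 or chunk in PHRASES:
                out.append(chunk)
                i += n
                break
    return out


print(segment("Find distance from Enterprise to 42N 57W"))
# ['find distance from', 'enterprise', 'to', '42n', '57w']
```

The three words "find distance from" come out as one utterance,
which is how the subject is expected to speak them.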

Ensure the microphone is correctly positioned and if it
moves stop and reposition it.

The green READY light must be on for the T600 to accept 
your utterance. Allow a short pause between each utter- 
ance for it to come back on. 

A forceful tone of voice produces the best results. Try
not to draw out the utterance with a breathing noise at
the end.

APPENDIX H

PRACTICE VOICE COMMANDS

1. For E2C lay a barrier from 36N 76W bearing 180
    distance 100 using sonobuoy passive.

2. For 1F14B bingo.

3. For EA3 proceed position 27N 183E.

4. For 1F14B speed 1200 course 090 altitude 10000.

5. Designate Enterprise 77.1.